Extracting and calculating effects
An effect size may be descriptive, if it is a sample-based estimate of a single quantity, or comparative, if it estimates the relationship or difference between two variables or groups in a population.
Also known as an outcome measure, treatment effect, or simply effect, depending on the field.
Selecting effects
1: A difference in the means of groups.
2: A difference in binary outcomes between groups.
3: A continuous association between two variables.
N.B. Samples are only ever samples, so effect-size measures always carry sampling error.
Hedges' g: useful when quantifying a difference in the means of two groups.
dat_yield %>%
  group_by(treatment) %>%
  summarise(mean = mean(yield),
            sd   = sd(yield),
            n    = n())
## # A tibble: 2 × 4
##   treatment              mean    sd     n
##   <chr>                 <dbl> <dbl> <int>
## 1 control                242.  9.72    65
## 2 nitrogenous_fertilizer 259.  9.73    65
How does the addition of nitrogen-rich fertilizer affect the yield of wheat?
\(\Large g = 1.74\)
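The same number can be recovered by hand from the summary statistics above (a sketch in base R, using the means, SDs, and group sizes from the table earlier):

```r
# Hedges' g by hand, using the summary statistics above
m1 <- 259; sd1 <- 9.73; n1 <- 65   # nitrogenous_fertilizer
m2 <- 242; sd2 <- 9.72; n2 <- 65   # control

# Pooled standard deviation
sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

# Cohen's d, then the small-sample correction that gives Hedges' g
d <- (m1 - m2) / sp
g <- d * (1 - 3 / (4 * (n1 + n2 - 2) - 1))
round(g, 2)  # 1.74
```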
Interpretation, ~ish (Cohen's 1988 benchmarks are rough guidelines at best):
The odds ratio: for measuring a difference in binary outcomes between groups.
Insects in patches that received a pesticide treatment had 7.25 times the odds of dying than those in the pesticide-free control.
Interpretation:
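As a sketch of the arithmetic, the 7.25 above could arise from a 2 × 2 table like the following (counts invented for illustration):

```r
# Hypothetical 2x2 table of insect outcomes (counts invented for illustration)
died_trt  <- 29; alive_trt  <- 10   # pesticide patches
died_ctrl <- 20; alive_ctrl <- 50   # control patches

# Odds of dying in each group, and their ratio
or <- (died_trt / alive_trt) / (died_ctrl / alive_ctrl)
or       # 7.25
log(or)  # meta-analyses usually model the log odds ratio
```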
Pearson’s r
It's the covariance between two variables, normalised by the product of their standard deviations: \(r = \mathrm{cov}(x, y) / (s_x s_y)\).
Interpretation:
Extracting effects
From best to worst-case:
metafor::escalc(): Calculate various effect sizes or outcome measures (and the corresponding sampling variances) that are commonly used in meta-analyses. When you have rich summary data.
compute.es: For calculating the most widely used effect sizes (ES), along with their variances, confidence intervals and p-values. When you need to convert among effects or impute from test statistics.
Example: from data to g
…the addition of nitrogen-rich fertilizer had a strong positive effect on yield (control = 242 \(\pm\) 9.72, treatment = 259 \(\pm\) 9.73 kg/ha).
metafor::escalc('SMD', m1i = 259, sd1i = 9.73, n1i = 65,
                m2i = 242, sd2i = 9.72, n2i = 65)
## 
##       yi     vi 
## 1 1.7378 0.0424
Example: from r to g
…across the surveyed farms, we found a strong correlation between nitrogen content and crop yield (r = 0.62, n = 33).
compute.es::res(r = 0.62, n = 33)
## Mean Differences ES: 
## 
##  d [ 95 %CI] = 1.58 [ 0.7 , 2.46 ] 
##   var(d) = 0.2 
##   p-value(d) = 0 
##   U3(d) = 94.3 % 
##   CLES(d) = 86.81 % 
##   Cliff's Delta = 0.74 
## 
## Correlation ES: 
## 
##  r [ 95 %CI] = 0.62 [ 0.35 , 0.79 ] 
##   var(r) = 0.01 
##   p-value(r) = 0 
## 
##  z [ 95 %CI] = 0.73 [ 0.37 , 1.08 ] 
##   var(z) = 0.03 
##   p-value(z) = 0 
## 
## Odds Ratio ES: 
## 
##  OR [ 95 %CI] = 17.58 [ 3.54 , 87.23 ] 
##   p-value(OR) = 0 
## 
##  Log OR [ 95 %CI] = 2.87 [ 1.26 , 4.47 ] 
##   var(lOR) = 0.67 
##   p-value(Log OR) = 0 
## 
## Other: 
## 
##  NNT = 1.75 
##  Total N = 33
Meta-analytic (statistical) models
We have:
Two goals:
The common-effects model assumes there is one true effect that underlies all the studies in the analysis. All differences in observed effects among individual studies are due to random sampling error alone.
The random-effects model assumes the true effect differs from study to study. Differences in observed effects among studies are due to random sampling error, as well as between-study heterogeneity in true effects. You’ll almost always use this.
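In standard meta-analysis notation (with \(y_i\) the observed effect from study \(i\) and \(v_i\) its sampling variance), the two models differ only by the extra between-study term:

Common-effects: \(y_i = \mu + e_i, \quad e_i \sim N(0, v_i)\)

Random-effects: \(y_i = \mu + u_i + e_i, \quad u_i \sim N(0, \tau^2),\ e_i \sim N(0, v_i)\)

Here \(\tau^2\) is the between-study variance in true effects, which the random-effects model estimates alongside \(\mu\).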
escalc(): For effect size calculation
rma(): The workhorse for linear meta-analytic modelling (fixed/random/mixed-effects)
rma.mv(): For multilevel linear modelling (fixed/random/mixed-effects)
meta_warning in your datasets folder
Inspect the data
names(dat_warning)
## [1] "author" "year" "obs" "study" "group" "study_type" "class" "subclass" ## [9] "order" "suborder" "family" "genus" "species" "tox_measure" "col_var" "n" ## [17] "r"
Inspect the data
dat_warning[, 16:17]
## # A tibble: 122 × 2
##        n     r
##    <dbl> <dbl>
##  1    39  0.19
##  2    36  0.22
##  3    36  0.12
##  4    36  0.38
##  5    37  0.03
##  6    37  0.22
##  7    39 -0.09
##  8    36  0.3 
##  9    36  0.21
## 10    36 -0.14
## # ℹ 112 more rows
Convert our correlation coefficient (Pearson’s r) to the more suitable Fisher’s z
dat_warning <- escalc(measure = 'ZCOR', ri = r, ni = n, data = dat_warning)
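Under the hood, 'ZCOR' is just Fisher's z-transform with sampling variance \(1/(n - 3)\); a quick hand check (base R) against the first row of the data:

```r
# Fisher's z-transform for the first effect in the dataset (r = 0.19, n = 39)
r <- 0.19; n <- 39

z  <- atanh(r)       # equivalently 0.5 * log((1 + r) / (1 - r))
vz <- 1 / (n - 3)    # large-sample variance of z

round(c(z, vz), 4)   # 0.1923 0.0278 -- matching yi and vi from escalc()
```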
Have another look at the data
dat_warning[, 16:19]
## 
##     n     r      yi     vi
## 1  39  0.19  0.1923 0.0278
## 2  36  0.22  0.2237 0.0303
## 3  36  0.12  0.1206 0.0303
## 4  36  0.38  0.4001 0.0303
## 5  37  0.03  0.0300 0.0294
## 6  37  0.22  0.2237 0.0294
## 7  39 -0.09 -0.0902 0.0278
## 8  36  0.30  0.3095 0.0303
## 9  36  0.21  0.2132 0.0303
## 10 36 -0.14 -0.1409 0.0303
## 11 37 -0.11 -0.1104 0.0294
## 12 37 -0.19 -0.1923 0.0294
## … (110 further rows omitted)
m_random <- rma(yi = yi, vi = vi, method = "REML", weighted = TRUE, data = dat_warning)
yi: The vector containing our effect sizes
vi: The vector containing the associated variance for each effect size
method: "REML" for random-effects estimation (many options, but that's the default & a good choice)
weighted: Should effect sizes be weighted during estimation? (Default is TRUE, but we'll be explicit)
data: the dataset containing all this
Take a look at the results
m_random
## 
## Random-Effects Model (k = 122; tau^2 estimator: REML)
## 
## tau^2 (estimated amount of total heterogeneity): 0.0803 (SE = 0.0191)
## tau (square root of estimated tau^2 value):      0.2834
## I^2 (total heterogeneity / total variability):   58.26%
## H^2 (total variability / sampling variability):  2.40
## 
## Test for Heterogeneity:
## Q(df = 121) = 280.1755, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub      
##   0.2414  0.0361  6.6852  <.0001  0.1706  0.3122  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
N.B. This is still Fisher’s z, but we could transform back to r using metafor::transf.ztor()
Mean overall correlation between colour signal expression & toxicity under a random-effects model is z = 0.241 ± 0.037 [0.171, 0.312].
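A quick sketch of that back-transformation, applied to the estimate and CI bounds from the output above (transf.ztor() is just the hyperbolic tangent, so base R suffices here):

```r
# Fisher's z back to r: metafor::transf.ztor() is equivalent to tanh()
z_vals <- c(est = 0.2414, ci.lb = 0.1706, ci.ub = 0.3122)
r_vals <- tanh(z_vals)
round(r_vals, 2)  # roughly 0.24, 0.17, 0.30
```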
A forest plot displays the effect size estimates of individual contributing studies, as well as the overall mean ± variance. You can use the forest() function from metafor for a quick (and fairly ugly) plot.
# For a subset only
forest(m_subset, slab = paste(dat_subset$author, dat_subset$year))
Quantifying heterogeneity
Two goals:
Eg: We synthesise 32 experimental studies examining whether the presence of herbivores can help to prevent the establishment of newly-invasive plants. We find a moderate reduction in the survival of exotic plants in the presence of herbivores (SMD = 0.36 ± 0.08), and again high heterogeneity (\(I^2\) = 86%).
\(\tau\): The standard deviation of underlying effects across studies. Units = effect size.
\(\tau^2\): The variance of underlying effects across studies. Units = effect size.
\(I^2\): The amount of variation in effect sizes that cannot be explained by sampling error. Units = 0–100%.
\(95\%\ CI\): The precision of the estimated mean effect (confidence interval).
\(95\%\ PI\): The dispersion of effects around the mean (prediction interval).
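Under the usual normal approximation, the two intervals are built from the same pieces; note that \(\hat{\tau}^2\) enters only the prediction interval:

\(95\%\ CI: \hat{\mu} \pm 1.96 \times SE(\hat{\mu})\)

\(95\%\ PI: \hat{\mu} \pm 1.96 \times \sqrt{\hat{\tau}^2 + SE(\hat{\mu})^2}\)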
# Load our library
library(metafor)

# Load data on bcg and calculate effect size (odds ratio)
bcg_data <- escalc(measure = "OR", ai = tpos, bi = tneg,
                   ci = cpos, di = cneg, data = dat.bcg)

# Run a random-effects model
m_bcg <- rma(yi, vi, method = 'REML', data = bcg_data)

# And inspect the result
m_bcg
What is the overall mean effect of the BCG vaccine on tuberculosis prevalence? How heterogeneous are the effects between studies? (Where is our prediction interval?).
# Confidence and prediction intervals
predict(m_bcg)
## 
##    pred     se   ci.lb   ci.ub   pi.lb  pi.ub 
## -0.7452 0.1860 -1.1098 -0.3806 -1.9412 0.4508
Forest plot for BCG data
Given all that heterogeneity, we may like to consider the influence of moderators via meta-regression, by extending our random-effects model to become a mixed-effects model. As part of the dataset, the authors also recorded the following information about each study:
ablat: absolute latitude of the study location (in degrees)
alloc: method of treatment allocation (random, alternate, or systematic assignment)
year: publication year
Hypotheses?
Let's test our hypothesised moderator using a mixed-effects model. In metafor, we can specify this via:
bcg_me <- rma(yi, vi, mods = ~ ablat, data = bcg_data)
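In equation form, this adds a slope for the moderator to the random-effects model:

\(y_i = \beta_0 + \beta_1\,\mathrm{ablat}_i + u_i + e_i, \quad u_i \sim N(0, \tau^2),\ e_i \sim N(0, v_i)\)

so \(\tau^2\) now measures the heterogeneity left over after accounting for latitude.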
And check out the results:
intrcpt: LOR is ~0 when ablat = 0 (i.e. treatment less effective/ineffective at the equator)
ablat: Moderate, negative effect of ablat on LOR estimates. Estimated (average) log odds ratio becomes increasingly negative (i.e. treatment effect is greater) for study sites further from the equator.
Yes, the sum-total evidence to date (well, ~1994) suggests so.
Yes: a moderate amount of variation is explained by the latitude of the population to whom the vaccine is administered.
Non-exhaustive, example sources of bias:
funnel(m_random)
The logic:
Thanks!